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Abstract 



We present efficient numerical techniques for calculation of eigen- 
value distributions of random matrices in the bcta-cnscmblcs. We 
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' compute histograms using direct simulations on very large matrices, 

, by using tridiagonal matrices with appropriate simplifications. The 

distributions are also obtained by numerical solution of the Painleve 
II and V equations with high accuracy. For the spacings we show a 
technique based on the Prolate matrix and Richardson extrapolation, 
and we compare the distributions with the zeros of the Riemann zeta 
function. 
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+3 ; 1 Largest Eigenvalue Distributions 

In this section, the distributions of the largest eigenvalue of matrices in the 
/3-ensembles are studied. Histograms are created first by simulation, then 
^ . by solving the Painleve II nonlinear differential equation. 

S3' 

1.1 Simulation 

The Gaussian Unitary Ensemble (GUE) is defined as the Hermitian n x 
n matrices A, where the diagonal elements Xjj and the upper triangular 
elements x-f. = Uj k + ivjk are independent Gaussians with zero-mean, and 

|Var(x 3i ) = 1, 1 < j < n, 

\Var(u jk ) = Y&r(v jk ) = |, 1 < j < k < n. 

Since a sum of Gaussians is a new Gaussian, an instance of these matrices 
can be created conveniently in MATLAB: 



1 



A=randii(ii)+i*randn(n) ; 
A=(A+A')/2; 

The largest eigenvalue of this matrix is about 2yfn. To get a distribution 
that converges as n — > oo, the shifted and scaled largest eigenvalue A max is 
calculated as 

A max = n\ (A max - 2y/n) . (2) 

It is now straight-forward to compute the distribution for A^ ax by simula- 
tion: 

for ii=l: trials 

A=randn(n)+i*randn(n) ; 
A=(A+A')/2; 
lmax=max (eig (A) ) ; 

lmaxscaled=n~ (1/6)* (lmax-2*sqrt (n) ) ; 
7 Store lmax 
end 



'/, Create and plot histogram 



The problem with this technique is that the computational requirements 
and the memory requirements grow fast with n, which should be as large 
as possible in order to be a good approximation of infinity. Just storing 
the matrix A requires n 2 double-precision numbers, so on most computers 
today n has to be less than 10 4 . Furthermore, computing all the eigenvalues 
of a full Hermitian matrix requires a computing time proportional to n 3 . 
This means that it will take many days to create a smooth histogram by 
simulation, even for relatively small values of n. 

To improve upon this situation, another matrix can be studied that has 
the same eigenvalue distribution as A above. In it was shown that this 
is true for the following symmetric matrix when (3 = 2: 



Hr- 



1 

V2 



/iV(0,2) X (n-l)P 

X(n-l)/3 ^(0,2) X(n-2)/3 



X2/3 



V 



\ 



iV(0,2) xp 
Xp N(0,2)J 



(3) 



Here, N(0, 2) is a zero-mean Gaussian with variance 2, and Xd is the square- 
root of a x 2 distributed number with d degrees of freedom. Note that the 
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matrix is symmetric, so the subdiagonal and the superdiagonal are always 
equal. 

This matrix has a tridiagonal sparsity structure, and only 2n double- 
precision numbers are required to store an instance of it. The time for 
computing the largest eigenvalue is proportional to n, either using Krylov 
subspace based methods or the method of bisection [7j. 

In MATLAB, the built-in function eigs can be used, although that re- 
quires dealing with the sparse matrix structure. There is also a large amount 
of overhead in this function, which results in a relatively poor performance. 
Instead, the function maxeig is used below to compute the eigenvalues. 
This is not a built-in function in MATLAB, but it can be downloaded from 
http://www-math.mit.edu/~persson/mltrid. It is based on the method of 
bisection, and requires just two ordinary MATLAB vectors as input, corre- 
sponding to the diagonal and the subdiagonal. 

It also turns out that only the first 10n3 components of the eigenvector 
corresponding to the largest eigenvalue are significantly greater than zero. 
Therefore, the upper-left n cuto fj by n cuto fj submatrix has the same largest 
eigenvalue (or at least very close), where 

^cutoff ~ 10r*5. (4) 

Matrices of size n = 10 12 can then easily be used since the computations can 
be done on a matrix of size only 10ns = 10 5 . Also, for these large values of 
n the approximation \\ ~ n is accurate. 

A histogram of the distribution for n = 10 9 can now be created using 
the code below. 

n=le9; 

nrep=le4; 

beta=2; 

cutoff =round(10*iT (1/3) ) ; 
dl=sqrt(n-l:-l:n+l-cutoff) '/2/sqrt(n) ; 

ls=zeros(l,nrep) ; 
for ii=l:nrep 

dO=randn(cutof f , 1) / sqrt (n*beta) ; 

ls(ii)=maxeig(dO,dl) ; 
end 

ls=(ls-l)*n~(2/3)*2; 
histdistr (Is ,-7:0.2:3) 
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Normalized and Scaled Largest Eigenvalue Normalized and Scaled Largest Eigenvalue Normalized and Scaled Largest Eigenvalue 



Figure 1: Probability distribution of scaled largest eigenvalue (10 5 repeti- 
tions, n = 10 9 ) 

where the function histdistr below is used to histogram the data. It 
assumes that the histogram boxes are equidistant. 

function [xmid,H] =histdistr (Is ,x) 

dx=x(2)-x(l) ; 
H=histc (Is ,x) ; 
H=H(l:end-l) ; 
H=H/sum(H)/dx; 

xmid= (x ( 1 : end-1) +x (2 : end) ) /2 ; 

bar (xmid,H) 
grid on 

The resulting distribution is shown in Figure LB together with distribu- 
tions for (3 = 1 and (3 = 4. The plots also contain solid curves representing 
the "true solutions" (see next section). 

1.2 Painleve II 

Instead of using simulation to plot the distributions of the largest eigen- 
values, it can be computed from the solution of the Painleve II nonlinear 
differential equation [0]: 

q" = sq + 2q 3 (5) 
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with the boundary condition 



q(s) ~ Ai(s), as s — > oo. (6) 
The probability distribution /^(s), corresponding to (3 = 2, is then given by 

Ms) = ^aW, (7) 

where 

(x - s)q(x) 2 dx ) . (8) 



F 2 (s) = exp ^- J 



The distributions for /3 = 1 and (3 = 4 are the derivatives of 

F!(s) 2 = F 2 (s)e-^ <?(x) ^ (9) 

and 

/ s \ 2 ( e \ST i( x ) dx + e -| XT d:c \ 2 

*(il) = * w ( 1 )• < 10 > 

To solve this numerically using MATLAB, first rewrite © as a first-order 
system: 

This can be solved as an initial-value problem starting at s = so = suf- 
ficiently large positive number, and integrating along the negative s-axis. 
The boundary condition Q then becomes the initial values 

q(s ) = Ai(s ) r v 

g'( So ) = Ai'( So ). [U) 

Although the distributions can be computed from q(s) as a post-processing 
step, it is most convenient to add a few variables and equations to the ODE 
system. When computing ^(s), the quantity I(s) = f^°(x — s)q(x) 2 dx is 
required. Differentiate this twice to get 

/•CO 

I'( s ) = - q(x) 2 dx (13) 
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and 



(14) 



Add these equations and the variables I(s),I'(s) to the ODE system, and 
the solver will do the integration. This is not only easier and gives less code, 
it will also give a much more accurate solution since the same tolerance 
requirements are imposed on I(s) as on the solution q(s). 

In a similar way, the quantity J(s) = j s q(x) dx is needed when com- 
puting F\(s) and F±(s). This is handled by adding the variable J(s) and 
the equation J'(s) = —q(s). 

The final system now has the form 



d_ 

Ts 



(q\ 

q' 
I 
V 

\JJ 



( q' \ 

sq + 2q 3 

r 

V -q J 



(15) 



with the initial condition 



(q{so)\ 




q'(so) 








I'(so) 





V 



Ai(a ) 
Ai'(ao) 

- s )Ai(x) 2 dx 
Ai(s ) 2 

C Ai ( x ) dx 



(16) 



This problem can be solved in just a few lines of MATLAB code using the 
built-in Runge-Kutta based ODE solver ode45. First define the system of 
equations as an inline function 

deq=inline(' [y(2); s*y (l)+2*y (1) "3; y(4) ; y(l)~2; -y(l)] ' , >s> , >y>) ; 

Next specify the integration interval and the desired output times. 

s0=5; 
sn=-8 ; 

sspan=linspace(s0,sn,1000) ; 

The initial values can be computed as 

y0= [airy (sO) ; airy(l,sO); ... 

quadl (inline ( ' (x-sO) .*airy(x) .~2' , 'x' , 'sO') , sO ,20 , le-25 ,0, sO) ; . 
airy(s0)"2; quadl (inline (' airy (x) ') ,s0,20,le-18)] ; 
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where the quadl function is used to numerically approximate the integrals 
in H16j) . Now, the integration tolerances can be set and the system inte- 
grated: 

opts=odeset ( ' reltol ' , le-13 , ' abstol ' , le-15) ; 
[s,y]=ode45(deq,sspan,y0,opts) ; 

The five dependent variables are now in the columns of the MATLAB vari- 
able y. Using these, F 2 (s), Fi(s), and F^s) become 

F2=exp(-y(: ,3)) ; 
Fl=sqrt(F2.*exp(-y(: ,5))) ; 

F4=sqrt(F2) .*(exp(y(: ,5)/2)+exp(-y( : ,5)/2))/2; 
s4=s/2~(2/3) ; 

The probability distributions f 2 (s), f\(s), and fi(s) could be computed by 
numerical differentiation: 

f2=gradient(F2,s) ; 
f l=gradient (Fl , s) ; 
f 4=gradient (F4 , s4) ; 

but it is more accurate to first do the differentiation symbolically: 

f 2 (s) = -l'(s)F 2 (s) (17) 

A 00 = W) + Q(s)F 2 (s)) e~^) (18) 

2F 1 (s) 

U(a) = — ^ (f 2 (s) h + e J W + e~ J ^) + F 2 (s)q(s) (e" J W - e J ^ 

2a4F 4 (s) v v J v 

(19) 

and evaluate these expressions: 
f2=-y(: ,4) .*F2; 

fl=l/2./Fl.*(f2+y(: ,1) -*F2) .*exp(-y(: ,5)) ; 
f4=l/2~(l/3)/4./F4.*(f2.*(2+exp(y(: ,5) )+exp(-y( : ,5)))+ . . . 

F2.*y(: ,1) .*(exp(-y(: ,5))-exp(y(: ,5)))) ; 

Finally, plot the curves: 

plot(s,f I,s,f2,s4,f4) 

legend C\beta=l' , '\beta=2' , '\beta=4') 

xlabel('s') 

ylabeK 'f _\beta(s) ' , 'rotation' ,0) 
grid on 

The result can be seen in Figure [U 
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Figure 2: The probability distributions /i(s), f2(s), and /4(a), computed 
using the Painleve II solution. 
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2 Eigenvalue Spacings Distributions 



Another quantity with an interesting probability distribution is the spacings 
of the eigenvalues of random matrices. It turns out that the eigenvalues are 
almost uniformly distributed, which means that every random matrix gives a 
large number of spacings. The distributions can then be efficiently computed 
by simulation. 

Two other methods are used to compute the spacings distribution - the 
solution of the Painleve V nonlinear differential equation and the eigenvalues 
of the Prolate matrix. Finally, the results are compared with the spacings 
of the zeros along the critical line of the Riemann zeta function. 

2.1 Simulation 

As before, the simulations are made with matrices from the Gaussian Uni- 
tary Ensemble. The normalized spacings of the eigenvalues Ai < A2 < • • • < 
Xn are computed according to 

S' k = Xk+ l~ Xk ^n-Xl (20) 

where (3 = 2 for the GUE. The distribution of the eigenvalues is almost 
uniform, with a slight deviation at the two ends of the spectrum. Therefore, 
only half of the eigenvalues are used, and one quarter of the eigenvalues at 
each end is discarded. 

Again, to allow for a more efficient simulation, the tridiagonal matrix © 
is used instead of the full Hermitian matrix. In this case, all the eigenvalues 
are computed, which can be done in a time proportional to n 2 . While this 
could in principle be done using the MATLAB sparse matrix structure and 
the eigs function, the more efficient trideig function is used below to 
compute all the eigenvalues of a symmetric tridiagonal matrix. It can be 
downloaded from http://www-math.mit.edu/~persson/mltrid. 

The histogram can now be computed by simulation with the following 
lines of code. Note that the function chi2rnd from the Statistics Toolbox 
is required. 

n=1000; 

nrep=1000; 

beta=2; 

ds=zeros ( 1 , nrep*n/2) ; 
for ii=l:nrep 

l=trideig(randn(n, 1) , sqrt (chi2rnd( (n-1 : -1 : 1) ' *beta)/2)) ; 



9 



Random Matrix Eigenvalue Distribution 

1 i 1 1 1 1 




Normalized Consecutive spacings 



Figure 3: Probability distribution of consecutive spacings of random matrix 
eigenvalues (1000 repetitions, n = 1000) 

d=dif f (1 (n/4 : 3*n/4) ) /beta/pi . *sqrt (2*beta*n-l (n/4 : 3*n/4-l) .'2); 
ds((ii-l)*n/2+l: ii*n/2)=d; 
end 

histdistr(ds, 0:0. 05:5) ; 

The resulting histogram can be found in Figure 01 The figure also shows the 
expected curve as a solid line. 

2.2 Painleve V 

The probability distribution p(s) for the eigenvalue spacings when (3 = 2 
can be computed with the solution to the Painleve V nonlinear differential 
equation (see •"> for details): 

{ta"f + 4(ta' - a) {to 1 - a + (a') 2 ) = (21) 
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with the boundary condition 



a(t) « -- - (-} , as t -> 0+ (22) 

7T V 7T / 
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p(s) = ^E(s) (23) 



£(s) = exp ( / ^-dt ) . (24) 



Then p(s) is given by 



where 



Explicit differentiation gives 

p(s) = -j (irsa'^s) - cr(vrs) + o-(vrs) 2 ) E(s). (25) 

The second-order differential equation (|21j) can be written as a first-order 
system of differential equations: 

dt (?) = (-W(v-ta>) (ta' - a + (a') 2 )) ' (26) 

This is solved as an initial- value problem starting at t = to =very small 
positive number. The value t = has to be avoided because of the division 
by t in the system of equations. This is not a problem, since the boundary 
condition (12 2 H provides an accurate value for a{to) (as well as E{to/n)). The 
boundary conditions for the system ()26|) then become 

M*oi --*-£>' (27) 



7T 7T 



|/(*o) 

To be able to compute -E(s) using (|2*i|) . the variable 



I{t)= t^p-dt> (28) 

is added to the system, as well as the equation jj^I = j. The corresponding 
initial value is 
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Putting it all together, the final system is 

-IV^-^O (to* -e + icj'f) I (30) 







ft 









a' 



with boundary condition 

TV \ TV J 



v 



r l _ So 

7T 7^ 

to _ *0 

TV It? 1 



(31) 



This system is defined as an inline function in MATLAB: 

desig=inline( [' [y(2) ; -2/t*sqrt((y(l)-t*y(2))*' ... 

'(t*y(2)-y(l)+y(2)~2)); y(l)/t] '] , 't' , 'y') ; 

Specify the integration interval and the desired output times: 

t0=le-12; 
tn=16; 

tspan=linspace(tO,tn, 1000) ; 
Set the initial condition: 

y0=[-t0/pi-(t0/pi)~2; -l/pi-2*t0/pi ; -t0/pi-t0~2/2/pi~2] ; 

Finally, set the integration tolerances and call the solver: 

opts=odeset ( ' reltol ' , le-13 , ' abstol ' , le-14) ; 
[t ,y] =ode45(desig,t span, yO, opts) ; 

The solution components are now in the columns of y. Use these to evaluate 
E(s) and p(s): 

s=t/pi ; 

E=exp(y ( : ,3) ) ; 

p=l./s.~2.*E.*(t.*y(: ,2)-y(: ,l)+y(: ,1) • ~2) ; 
p(l)=2*s(l); '/, Fix due to cancellation 

A plot of p(s) can be made with the command 

plot(s,p) 
grid on 

and it can be seen in Figure |3J Plots are also shown of E(s) and cr(t). 
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Figure 4: Painleve V (left), E(s) and p(s) (right). 



2.3 The Prolate Matrix 

Another method to calculate the distribution of the eigenvalue spacings is 
to compute the eigenvalues Aj of the operator 



Then E(2t) = Yl^l — Aj), and p(s) can be computed as before. To do this, 
first define the infinite symmetric Prolate matrix: 



with ao = 2w, 0^ = {sm.2irwk) / irk for k = 1,2,..., and < w < ^- A 
discretization of Q(x,y) is achieved by setting w = t/n and extracting the 
upper-left n x n submatrix A n of Aqq. 

Below, the full matrix A n is used, and all the eigenvalues are computed 
in n 3 time using the eig function. However, A n commutes with the following 
symmetric tridiagonal matrix [I], and therefore has the same eigenvectors: 




(32) 



A 




(33) 



Pi a 2 fa 



(34) 



V 



0n-2 «n-l (3 n -l 
Pn-1 a n j 
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where 



{ 



Ok = ( 



0k = 



2&(n — k). 



cos 2ttw 



(35) 



It is then in principle possible to use the new techniques described in to 
compute all the eigenvalues and eigenvectors of T n in n 2 time, and then get 
the eigenvalues of A n by dot products. This is not done in this example. 

The code for computing E(s) is shown below. This time, p(s) is evaluated 
by numerical differentiation since no information about the derivative of 
E(s) is available. 

s=0:0.01:5; 
n=100; 

E0=zeros(size(s)) ; 
for ii=l : length(s) 

Q=gallery ( 'prolate ' ,n,s(ii)/2/n) ; 

E0(ii)=prod(l-eig(Q)) ; 



pO=gradient (gradient (E0 , s) , s) ; 

To improve the accuracy in E(s), Richardson extrapolation can be used. 
This is done as follows, where the values are assumed to converge as 1/n 2 : 

7 ... Compute s and E using Painleve V in previous section 

Es=zeros (length(t) ,0) ; 
El=zeros (size (s) ) ; 
for n=20*2.~(0:3) 
for ii=l : length(s) 

Q=gallery(' prolate' ,n, s (ii) /2/n) ; 



Es=[Es,El] ; 
end 

for ii=l:3 

max(abs (Es-E( : ,ones(l,size(Es,2))))) 

Es=Es ( : , 2 : end) +dif f (Es , 1 , 2) / (2* (ii+1) -1) ; 
end 

max(abs(Es-E)) 

The errors maxo< s <5 \Ei(s) — E(s)\ are shown in Table 1, for n = 20, 40, 80, 
and 160. The error after all extrapolations is of the same order as the "exact 
solution" using Painleve V. 



end 



El(ii)=prod(l-eig(Q)) ; 



end 
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N 


Error 


Error 1 


Error 2 


Error 3 


20 


0.2244 








40 


0.0561 


0.7701 






80 


0.0140 


0.0483 


0.5486 




160 


0.0035 


0.0032 


0.0323 


2.2673 




•10~ 3 


•io- 7 


•io- 8 


•io- n 



Table 1: Difference between Prolate solution E\(s) and Painleve V solution 
E(s) after 0, 1, 2, and 3 Richardson extrapolations. 

2.4 Riemann Zeta Zeros 

It has been observed that the zeros of the Riemann zeta function along the 
critical line Re(z) = \ behave similar to the eigenvalues of random matrices 
in the GUE. Here, the distribution of the scaled spacings of the zeros is 
compared to the corresponding distribution for eigenvalues computed using 
the Painleve V equation from the previous chapters. 
Define the nth zero 7 n = n th as 

C Q + *7») = 0, < 71 < 72 < • • • (36) 
Compute a normalized spacing: 

In 

In = ! = In ■ 

av spacing near 7„ 

Zeros of the Riemann zeta function can be downloaded from 3 j. Assum- 
ing that the MATLAB variable gamma contains the zeros, and the variable 
offset the offset, these two lines compute the consecutive spacings 7 n +i— 7n 
and plot the histogram: 

delta=dif f (gamma) /2/pi . *log( (gamma (1 : end-l)+of f set)/2/pi) ; 
histdistr(delta, 0:0. 05:5.0) ; 

The result can be found in Figure along with the Painleve V distribution. 
The curves are indeed in good agreement, although the number of samples 
here is a little to low to get a perfect match. 



log7 n /2vr 



2tt 



(37) 
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Riemann Zeta Zero Distribution 

1 i 1 1 1 1 — 




Normalized Consecutive spacings 



Figure 5: Probability distribution of consecutive spacings of Riemann zeta 
zeros (30,000 zeros, n « 10 12 , 10 21 , 10 22 ) 
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